Clustering before training large datasets - Case study: K-SVD
Author
Abstract
Training and using overcomplete dictionaries has been the subject of many developments in signal processing and sparse representations. The main idea is to train a dictionary that achieves good sparse representations of the items in a given dataset. The most popular approach is the K-SVD algorithm, and in this paper we study its application to large datasets. The main interest is to speed up the training procedure while keeping the representation errors close to specified target values. This goal is reached with a clustering procedure, called here T-mindot, which reduces the size of the dataset but keeps the most representative data items together with a measure of their importance. Experimental simulations compare the running times and representation errors of the training method with and without the clustering procedure, and they clearly show how effective T-mindot is.
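The abstract does not describe how T-mindot works internally. To illustrate the general idea it names (shrink the training set while recording how many original items each kept item stands for), here is a minimal sketch of a greedy threshold-based reduction. It is NOT the paper's T-mindot: the function name `reduce_dataset`, the threshold `t`, the absolute dot product between unit-normalized signals as the similarity measure, and the first-come choice of representatives are all hypothetical choices made for this example. Normalizing also discards signal magnitudes, which is a simplification. Under the same assumptions, the returned weights w_i could scale each representative's term in a weighted training objective such as sum_i w_i * ||y_i - D x_i||^2.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Return v scaled to unit Euclidean norm (raises on the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [c / norm for c in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def reduce_dataset(signals: List[Vector], t: float) -> Tuple[List[Vector], List[int]]:
    """Hypothetical greedy threshold clustering (NOT the paper's T-mindot).

    Each signal is unit-normalized and compared with the representatives kept
    so far. If its largest absolute dot product with a representative is at
    least t, that representative's weight is incremented; otherwise the
    signal becomes a new representative with weight 1.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("t must lie in (0, 1]")
    reps: List[Vector] = []
    weights: List[int] = []
    for s in signals:
        u = _normalize(s)
        best_idx, best_sim = -1, -1.0
        for i, r in enumerate(reps):
            # Assumption: sign is ignored, since x and -x can be represented
            # with the same dictionary atoms (coefficients just flip sign).
            sim = abs(_dot(u, r))
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= t:
            weights[best_idx] += 1  # absorbed by an existing representative
        else:
            reps.append(u)          # kept as a new representative
            weights.append(1)
    return reps, weights


if __name__ == "__main__":
    data = [[1.0, 0.0], [0.99, 0.05], [-1.0, 0.0], [0.0, 2.0]]
    reps, w = reduce_dataset(data, t=0.95)
    print(len(reps), w)  # prints: 2 [3, 1]
```

In this sketch a larger `t` merges only nearly collinear signals, so it keeps more representatives (less reduction, closer to the full dataset), while a smaller `t` reduces the dataset more aggressively.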
Similar resources
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, advances in information-gathering technologies such as GPS and GSM networks have led to huge, complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large datasets has always been one of the most important challenges in different sciences,...
Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers
In the current study, a particle swarm clustering method is proposed for clustering triangular fuzzy data. This method can find fuzzy cluster centers; in the proposed method, the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Also, triangular fuzzy numbers are used to represent uncertain data. To compare triangular fuzzy ...
Clustering Algorithms Optimizer: A Framework for Large Datasets
Clustering algorithms are employed in many bioinformatics tasks, including categorization of protein sequences and analysis of gene-expression data. Although these algorithms are routinely applied, many of them suffer from the following limitations: (i) relying on predetermined parameter tuning, such as a-priori knowledge regarding the number of clusters; (ii) involving nondeterministic proced...
CLUST-SVD: Privacy preserving clustering in singular value decomposition
Large repositories of data contain sensitive information that must be protected against unauthorized access. The protection of the confidentiality of this information has been a long-term goal for the database security research community and for the government statistical agencies. Recent advances in data mining and machine learning algorithms have increased the disclosure risks that one may en...
Fuzzy clustering of time series data: A particle swarm optimization approach
With rapid development in information-gathering technologies and access to large amounts of data, we always require methods for analyzing data and extracting useful information from large raw datasets, and data mining is an important method for solving this problem. Clustering analysis, as the most commonly used function of data mining, has attracted many researchers in computer science. Because o...